Abstract. Our study assessed the performance of two Grammar Checkers (GCs), 
Grammarly and Virtual Writing Tutor, and the grammar checking function in 
Microsoft Word on a broad range of grammatical errors. The errors occurred in both 
authentic English as a Second Language (ESL) compositions and simple sentences 
we generated ourselves. We measured performance in terms of (1) coverage 
(rates of error detection), (2) accuracy of proposed replacement forms, and (3) 
‘false alarms’ (forms mistakenly flagged as incorrect). To the extent GCs provide 
accurate and comprehensive corrective feedback, they could relieve teachers of the 
time-consuming task of providing written feedback themselves. While inaccurate 
replacement forms and false alarms are relatively rare, we found GCs to have poor 
overall coverage (total error detection rates under 50%). Grammarly and Virtual 
Writing Tutor, however, outperform Microsoft Word. Coverage is also higher 
both for certain categories of error and for the sentences rather than the authentic 
compositions. Finally, although GCs do not provide comprehensive feedback, we 
suggest that teachers can still use them in specially designed activities that 
target select error types. 


Keywords: grammar checkers, corrective feedback, focus-on-form, second 
language learning. 


1. Introduction 


Our study investigates the adequacy of automatic corrective feedback from GCs 
to determine their possible use in the ESL classroom. Written corrective feedback 
permits teachers to incorporate a focus on form into the communicative classroom, 
thereby promoting accuracy and preventing fossilization (Bitchener, 2008; Ferris, 
Liu, Sinha, & Senna, 2013). Still, providing feedback is time-consuming, so the 
potential for GCs to relieve teachers’ workloads is appealing. On the face of it, 
GCs look like an invaluable tool for the ESL context. 


Important questions remain, however, concerning the quality of automatic corrective 
feedback. Previous studies have often adopted a narrow focus, evaluating GCs only 
on articles/determiners, prepositions, and collocations (De Felice & Pulman, 2008; 
Han, Chodorow, & Leacock, 2006). Research on the grammar checking function 
in automated writing evaluation systems has been more comprehensive (Dikli & 
Bleyle, 2014, on Criterion), but these systems are prohibitively expensive. In our 
view, an investigation of GCs available for little or no cost and on a wide range 
of grammatical issues is overdue. The current study thus addresses the following 
research questions: 


• To what extent is automatic corrective feedback comprehensive and 
accurate? 


• Do GCs perform better on certain grammar points than others? 


2. Method 


2.1. Data collection 


We evaluated two leading online GCs (Grammarly and Virtual Writing Tutor) and 
the grammar checking function in Microsoft Word on errors from two sources: (1) 
authentic compositions (50 handwritten essays generated under exam conditions 
by 28 francophone TESL students at a university in Quebec; 10 male, 18 female; 
aged 21-36); and (2) a set of 129 simple sentences containing errors, which we 
generated based on our knowledge of typical francophone errors. 


Representative errors were selected from the compositions, and these errors and 
the simple sentences were run through the three GCs to verify coverage (error 
detection rates) and accuracy of proposed replacement forms. The 50 compositions 
and 129 sentences were then run through the GCs to establish rates of ‘false alarms’ 
(forms mistakenly flagged as incorrect). 


2.2. Results 


Table 1 shows how the GCs performed on the two sets of errors (compositions vs. 
simple sentences) in the different grammatical categories listed on the left. The 
results are presented as fractions, such that 2/4, for example, indicates that the GC 
identified two out of four errors (Gram=Grammarly; VWT=Virtual Writing Tutor). 
Though many of the error categories are self-evident, others need explanation. By 
‘tense shift’, we mean shifts primarily between past and present in contexts where 
either is acceptable. The category ‘plural nouns’ refers to failure to pluralize a noun 
or pluralization of a non-count noun. Possessive errors involve inappropriate use 
of either apostrophe + ‘s’ or the periphrastic possessive with ‘of’. Pronoun errors 
concern incorrect reference. The category ‘relative clauses’ refers to incorrect 
comma usage with restrictive and non-restrictive relative clauses. 


Table 1. Rates of error detection: compositions vs. simple sentences 


Grammatical           Compositions                           Sentences
categories            Word         Gram         VWT          Word         Gram         VWT
Tense-aspect          2/4          1/4          2/4          1/9          4/9          0/9
Verb form             1/3          3/3          2/3          2/13         8/13         8/13
Subj-V agreement      0/3          3/3          0/3          0/6          6/6          6/6
Tense shift           0/6          0/6          0/6          0/2          0/2          0/2
Total                 3/16         7/16         4/16         3/30         18/30        14/30

Plural                1/3          3/3          3/3          4/20         11/20        11/20
Possessive            0/5          3/5          2/5          0/4          0/4          0/4
Pronoun               0/2          0/2          0/2          0/5          2/5          0/5
Total                 1/10         6/10         5/10         4/29         13/29        11/29

Wrong prep            0/3          1/3          1/3          0/10         8/10         8/10
Missing prep          0/2          0/2          0/2          0/4          2/4          2/4
Unnecessary prep      0/2          0/2          0/2          0/7          3/7          2/7
Total                 0/7          1/7          1/7          0/21         13/21        12/21

Word order            0/3          0/3          0/3          3/18         7/18         3/18
Word form             0/3          0/3          0/3          6/10         7/10         7/10
Total                 0/6          0/6          0/6          9/28         14/28        10/28

Determiner            0/4          0/4          0/4          1/13         4/13         4/13
Relative clause       0/3          0/3          0/3          2/8          1/8          0/8
Total                 0/7          0/7          0/7          3/21         5/21         4/21

Grand totals          4/46         14/46        10/46        19/129       63/129       51/129
                      (8.7%)       (30.4%)      (21.7%)      (14.7%)      (48.8%)      (39.5%)
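
The grand totals in Table 1 follow from summing the subtotal rows. The short Python sketch below is offered only as an illustration of that arithmetic: it copies the subtotals from the table, and the names `SUBTOTALS` and `coverage` are hypothetical labels introduced here, not part of the study's procedure.

```python
# Checker order used in every tuple below.
CHECKERS = ("Word", "Gram", "VWT")

# Subtotal ("Total") rows copied from Table 1:
# (errors in the group, detections by Word, Gram, VWT).
SUBTOTALS = {
    "compositions": [
        (16, (3, 7, 4)),     # tense-aspect, verb form, subj-V agreement, tense shift
        (10, (1, 6, 5)),     # plural, possessive, pronoun
        (7, (0, 1, 1)),      # wrong, missing, unnecessary preposition
        (6, (0, 0, 0)),      # word order, word form
        (7, (0, 0, 0)),      # determiner, relative clause
    ],
    "sentences": [
        (30, (3, 18, 14)),
        (29, (4, 13, 11)),
        (21, (0, 13, 12)),
        (28, (9, 14, 10)),
        (21, (3, 5, 4)),
    ],
}


def coverage(rows):
    """Return {checker: (detected, total, percent)} for one error source."""
    total = sum(n for n, _ in rows)
    result = {}
    for i, name in enumerate(CHECKERS):
        detected = sum(hits[i] for _, hits in rows)
        result[name] = (detected, total, round(100 * detected / total, 1))
    return result


for source, rows in SUBTOTALS.items():
    for name, (detected, total, pct) in coverage(rows).items():
        print(f"{source:12s} {name:5s} {detected}/{total} = {pct}%")
```

Summed this way, the sentence set contains 129 errors in all, which matches the number of sentences reported in the data collection section.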


The grand totals in Table 1 indicate poor overall error detection (all below 50%). 
In addition, Microsoft Word achieves considerably lower coverage than the two 
online GCs, with Grammarly generally outperforming Virtual Writing Tutor: 
hence, Grammarly >> Virtual Writing Tutor >> Microsoft Word. Error detection 
is greater on simple sentences than on compositions. In addition, there are some 
grammatical categories in which Grammarly, and to a degree Virtual Writing Tutor, 
perform better: particularly verb forms, subject-verb agreement and plural nouns. 
They are also strong in the ‘wrong preposition’ and ‘word form’ categories, but 
only with simple sentences. Finally, we can report that incorrect replacement forms 
are rare: we found one inaccurate replacement for Grammarly, three for Virtual 
Writing Tutor and four for Microsoft Word. 


While none of the GCs raised false alarms in the simple sentences, Grammarly 
shows a clear edge over both Virtual Writing Tutor and Microsoft Word for false 
alarms on the compositions (see Table 2). The absence of false alarms on the 
simple sentences is partly due to lack of opportunity (1,055 words in the sentences 
vs. 23,108 words in the compositions). Microsoft Word’s relatively low number of 
false alarms is probably a function of its low rate of error detection. 


Table 2. Rates of false alarms 


                    Microsoft Word    Grammarly    Virtual Writing Tutor
Compositions        13                4            30
Simple sentences    0                 0            0
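
The "lack of opportunity" point can be made concrete by normalizing the counts in Table 2 by text length. The Python sketch below is an illustration only: the per-1,000-word rate is a derived figure that the study does not report, and the assumption that the composition rate would carry over unchanged to the simple sentences is introduced here purely for the sake of the arithmetic.

```python
# Figures copied from Table 2 and the accompanying paragraph.
COMPOSITION_WORDS = 23_108
SENTENCE_WORDS = 1_055
FALSE_ALARMS = {"Microsoft Word": 13, "Grammarly": 4, "Virtual Writing Tutor": 30}


def per_thousand_words(count: int, words: int) -> float:
    """False alarms per 1,000 words of text."""
    return 1000 * count / words


def expected_in_sentences(count: int) -> float:
    """False alarms expected in the sentence set if the composition rate
    carried over unchanged (an illustrative assumption, not a study claim)."""
    return count * SENTENCE_WORDS / COMPOSITION_WORDS


for checker, count in FALSE_ALARMS.items():
    rate = per_thousand_words(count, COMPOSITION_WORDS)
    expected = expected_in_sentences(count)
    print(f"{checker}: {rate:.2f} per 1,000 words; "
          f"about {expected:.2f} expected in the sentences")
```

Under that assumption, each checker would be expected to raise fewer than two false alarms across the whole sentence set, so observing none is unsurprising.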


3. Discussion 


We evaluated the performance of two online GCs, Grammarly and Virtual Writing 
Tutor, and the grammar checking function in Microsoft Word on a wide range of 
grammatical errors. The fact that Grammarly and Virtual Writing Tutor clearly 
outperform Microsoft Word in error detection suggests that learners should be 
wary of relying on this omnipresent word processor to check the accuracy of 
their writing. They might instead consider turning to an online GC for a fuller 
picture. 


Nonetheless, Grammarly and Virtual Writing Tutor also show limited coverage, 
which parallels the findings in De Felice and Pulman (2008) and Han et al. (2006). 
An important implication is that ESL teachers cannot truly count on the technology 
to provide comprehensive written corrective feedback on student compositions. 
The fact that error detection rates were higher for the simple sentences than for the 
authentic compositions simply underscores this conclusion. 


The low rates of inaccurate replacement forms and false alarms are encouraging 
for the ESL context. Inaccurate feedback could lead ESL learners seriously astray, 
particularly since they lack native speaker intuitions to override misleading 
feedback. It is encouraging that GCs perform strongly in some categories of error 
(verb forms, subject-verb agreement, plural nouns, wrong prepositions, and word 
forms). We suggest that teachers use GCs to target specific error types in student 
compositions and encourage students to scrutinize their own writing for errors 
that the GC might have overlooked. Furthermore, teachers can develop special 
activities containing errors that the GCs are capable of identifying. Students can 
first try to identify the errors themselves and then run the text through the GC to 
check their answers. 


4. Conclusions 


While our findings show that GCs have poor overall coverage, Grammarly and 
Virtual Writing Tutor have higher coverage than Microsoft Word. GCs are also 
better at detecting errors in some categories than others and in specially composed 
simple sentences than in authentic compositions. Finally, both inaccurate 
replacement forms and false alarms are infrequent. Thus, though GCs cannot 
provide comprehensive corrective feedback on student compositions, they can be 
employed to target select error types in student writing and in specially developed 
activities alike. In this manner, GCs can be used effectively to incorporate a focus 
on form into the communicative ESL classroom. 